label query vs membership query
Active Learning of General Halfspaces: Label Queries vs Membership Queries
We study the problem of learning general (i.e., not necessarily homogeneous) halfspaces under the Gaussian distribution on \mathbb{R} d in the presence of some form of query access. In the classical pool-based active learning model, where the algorithm isallowed to make adaptive label queries to previously sampled points, we establish a strong information-theoretic lower bound ruling out non-trivialimprovements over the passive setting. Specifically, we show thatany active learner requires label complexity of \tilde{\Omega}(d/(\log(m)\epsilon)), where m is the number of unlabeled examples. Specifically, to beat the passive label complexity of \tilde{O}(d/\epsilon), an active learner requires a pool of 2 {\mathrm{poly}(d)} unlabeled samples.On the positive side, we show that this lower bound can be circumvented with membership query access, even in the agnostic model. Here p \in [0, 1/2] is the bias and \mathrm{opt} is the 0-1 loss of the optimal halfspace. As a corollary, we obtain a strong separation between the active and membership query models.